Mining association rules with multiple minimum supports: a new mining algorithm and a support tuning mechanism
نویسندگان
چکیده
Mining association rules with multiple minimum supports is an important generalization of the association-rule-mining problem, which was recently proposed by Liu et al. Instead of setting a single minimum support threshold for all items, they allow users to specify multiple minimum supports to reflect the natures of the items, and an Apriori-based algorithm, named MSapriori, is developed to mine all frequent itemsets. In this paper, we study the same problem but with two additional improvements. First, we propose a FP-tree-like structure, MIS-tree, to store the crucial information about frequent patterns. Accordingly, an efficient MIS-tree-based algorithm, called the CFP-growth algorithm, is developed for mining all frequent itemsets. Second, since each item can have its own minimum support, it is very difficult for users to set the appropriate thresholds for all items at a time. In practice, users need to tune items’ supports and run the mining algorithm repeatedly until a satisfactory end is reached. To speed up this time-consuming tuning process, an efficient algorithm which can maintain the MIS-tree structure without rescanning database is proposed. Experiments on both synthetic and real-life datasets show that our algorithms are much more efficient and scalable than the previous algorithm. D 2004 Elsevier B.V. All rights reserved.
منابع مشابه
A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining
Data envelopment analysis (DEA) is a relatively new data oriented approach to evaluate performance of a set of peer entities called decision-making units (DMUs) that convert multiple inputs into multiple outputs. Within a relative limited period, DEA has been converted into a strong quantitative and analytical tool to measure and evaluate performance. In an article written by Toloo et al. (2009...
متن کاملDiscovering high utility itemsets with multiple minimum supports
Generally, association rule mining uses only a single minimum support threshold for the whole database. This model implicitly assumes that all items in the database have the same nature. In real applications, however, each item can have different nature such as medical datasets which contain information of both diseases and symptoms or status related to the diseases. Therefore, association rule...
متن کاملFast Algorithm for Mining Generalized Association Rules
In this paper, we present a new algorithm for mining generalized association rules. We develop the algorithm which scans database one time only and use Tidset to compute the support of generalized itemset faster. A tree structure called GIT-tree, an extension of IT-tree, is developed to store database for mining frequent itemsets from hierarchical database. Our algorithm is often faster than MM...
متن کاملAn improved approach to find membership functions and multiple minimum supports in fuzzy data mining
Fuzzy mining approaches have recently been discussed for deriving fuzzy knowledge. Since items may have their own characteristics, different minimum supports and membership functions may be specified for different items. In the past, we proposed a genetic-fuzzy data-mining algorithm for extracting minimum supports and membership functions for items from quantitative transactions. In that paper,...
متن کاملMining association rules with multiple minimum supports using maximum constraints
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most of the previous approaches set a single minimum support threshold for all the items or itemsets. But in real applications, different items may have different criteria to judge its importance. The support requirements should then vary with different items. In t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Decision Support Systems
دوره 42 شماره
صفحات -
تاریخ انتشار 2006